Forced Derivations for Hierarchical Machine Translation

Authors

Stephan Peitz

Arne Mauser

Joern Wuebker

Hermann Ney

Abstract

We present an efficient framework to estimate the rule probabilities for a hierarchical phrase-based statistical machine translation system from parallel data. In previous work, this was done with bilingual parsing. We use a more efficient approach that splits bilingual parsing into two stages, which allows us to train a hierarchical translation model on larger tasks. Furthermore, we apply leave-one-out to counteract over-fitting and use the expected counts from the inside-outside algorithm to prune the rule set. On the WMT12 Europarl German→English and French→English tasks, we improve translation quality by up to 1.0 BLEU and 0.9 TER while simultaneously reducing the rule set to 5% of its original size.
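The abstract names two training-time operations: leave-one-out re-estimation of rule probabilities and pruning the rule set by expected counts from the inside-outside algorithm. The Python sketch below illustrates both on a toy example. It is not the authors' implementation: the hyperedge encoding, the function names (`inside_outside`, `leave_one_out_probs`, `prune_rules`), the uniform edge weights, the floor value for rules seen only in the held-out sentence, and the pruning threshold are all hypothetical. It also assumes the forced-derivation hypergraph (containing only derivations that produce the reference target) is already given, so the two-stage parsing itself is not shown. Leave-one-out here means estimating p(tgt | src) for sentence n as (N(src, tgt) - N_n(src, tgt)) / (N(src) - N_n(src)), where N_n are the counts contributed by sentence n.

```python
from collections import defaultdict
from math import prod

# Hypothetical hyperedge encoding: (head_node, rule, weight, tail_nodes),
# where rule = (source_side, target_side) of a hierarchical rule.


def inside_outside(nodes, edges, root):
    """Expected count of each rule in one forced-derivation hypergraph.

    `nodes` must be topologically sorted (tails before heads)."""
    incoming = defaultdict(list)
    for edge in edges:
        incoming[edge[0]].append(edge)
    # Inside pass: total weight of all sub-derivations rooted at each node.
    inside = {}
    for v in nodes:
        inside[v] = sum(w * prod(inside[t] for t in tails)
                        for _, _, w, tails in incoming[v])
    # Outside pass: weight of everything above a node, heads before tails.
    outside = defaultdict(float)
    outside[root] = 1.0
    for v in reversed(nodes):
        for _, _, w, tails in incoming[v]:
            for i, t in enumerate(tails):
                siblings = prod(inside[u] for j, u in enumerate(tails) if j != i)
                outside[t] += outside[v] * w * siblings
    # Posterior probability of each hyperedge, summed per rule.
    z = inside[root]
    counts = defaultdict(float)
    for head, rule, w, tails in edges:
        counts[rule] += outside[head] * w * prod(inside[t] for t in tails) / z
    return dict(counts)


def total_counts(per_sentence):
    """Sum expected rule counts over all sentence pairs."""
    total = defaultdict(float)
    for counts in per_sentence:
        for rule, c in counts.items():
            total[rule] += c
    return total


def leave_one_out_probs(per_sentence, n, floor=1e-4):
    """p(target | source) for the rules of sentence n, with sentence n's own
    counts removed from numerator and denominator."""
    total = total_counts(per_sentence)
    src_total = defaultdict(float)
    for (src, _), c in total.items():
        src_total[src] += c
    own_src = defaultdict(float)
    for (src, _), c in per_sentence[n].items():
        own_src[src] += c
    probs = {}
    for rule, c in per_sentence[n].items():
        num = total[rule] - c
        den = src_total[rule[0]] - own_src[rule[0]]
        # Rules seen only in sentence n get the (hypothetical) floor value.
        probs[rule] = num / den if num > 1e-9 and den > 0 else floor
    return probs


def prune_rules(per_sentence, threshold):
    """Keep rules whose total expected count reaches the threshold."""
    return {r for r, c in total_counts(per_sentence).items() if c >= threshold}


# Toy sentence pair "a b" -> "x y" with two forced derivations:
# one hierarchical rule over two lexical rules, or one phrase rule.
NODES = ["X01", "X12", "S02"]
EDGES = [
    ("X01", ("a", "x"), 1.0, ()),
    ("X12", ("b", "y"), 1.0, ()),
    ("S02", ("X~1 X~2", "X~1 X~2"), 1.0, ("X01", "X12")),
    ("S02", ("a b", "x y"), 1.0, ()),
]

if __name__ == "__main__":
    sent0 = inside_outside(NODES, EDGES, "S02")
    # A second sentence's expected counts, given directly for brevity.
    sent1 = {("a", "x"): 1.0, ("c", "z"): 1.0, ("X~1 X~2", "X~1 X~2"): 1.0}
    corpus = [sent0, sent1]
    print(sent0)
    print(leave_one_out_probs(corpus, 0))
    print(sorted(prune_rules(corpus, 1.0)))
```

With uniform weights, the two derivations of the toy sentence are equally likely, so each of its rules receives an expected count of 0.5. Under leave-one-out, the rules that occur only in that sentence ("b" → "y" and "a b" → "x y") fall back to the floor value, which is the mechanism that discourages memorizing sentence-specific rules; pruning at a total expected count of 1.0 then drops exactly those rules.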

Similar Resources

Using Syntactic Head Information in Hierarchical Phrase-Based Translation

Chiang’s hierarchical phrase-based (HPB) translation model advances the state-of-the-art in statistical machine translation by expanding conventional phrases to hierarchical phrases – phrases that contain sub-phrases. However, the original HPB model is prone to overgeneration due to lack of linguistic knowledge: the grammar may suggest more derivations than appropriate, many of which may lead t...

Max-Violation Perceptron and Forced Decoding for Scalable MT Training

While large-scale discriminative training has triumphed in many NLP problems, its definite success on machine translation has been largely elusive. Most recent efforts along this line are not scalable (training on the small dev set with features from top ∼100 most frequent words) and overly complicated. We instead present a very simple yet theoretically motivated approach by extending the recen...

A combination of hierarchical systems with forced alignments from phrase-based systems

Currently most state-of-the-art statistical machine translation systems present a mismatch between training and generation conditions. Word alignments are computed using the well known IBM models for single-word based translation. Afterwards phrases are extracted using extraction heuristics, unrelated to the stochastic models applied for finding the word alignment. In the last years, several re...

Search-Aware Tuning for Hierarchical Phrase-based Decoding

Parameter tuning is a key problem for statistical machine translation (SMT). Most popular parameter tuning algorithms for SMT are agnostic of decoding, resulting in parameters vulnerable to search errors in decoding. The recent research of “search-aware tuning” (Liu and Huang, 2014) addresses this problem by considering the partial derivations in every decoding step so that the promising ones a...

Undirected Machine Translation with Discriminative Reinforcement Learning

We present a novel Undirected Machine Translation model of Hierarchical MT that is not constrained to the standard bottom-up inference order. Removing the ordering constraint makes it possible to condition on top-down structure and surrounding context. This allows the introduction of a new class of contextual features that are not constrained to condition only on the bottom-up context. The model...

Publication date: 2012